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Abstract. For a linearly recurrent sequence Pn+i = A{n)-P„, consider 
the problem of calculating either the n-th term P„ or ^ < n arbitrary 
terms P„j , . . . P„j , both for the case of constant coefficients A{n) = A 
and for a matrix A{N) with entries polynomial in A*". 
We improve and extend known algorithms for this problem and present 
new applications for it. Specifically it turns out that for instance 

• any family (p,i) of classical orthogonal polynomials admits evaluation 
at given x within O(y^-logn) operations independent of the family 
(pn) under consideration. 

• For any £ indices ni, . . . ,ne < n, the values p„. (x) can be calculated 
simultaneously using 0{^/n ■ log n + £■ log j) arithmetic operations; 
again this running time bound holds uniformly. 

• Every hypergeometric (or, more generally, holonomic) function ad- 
mits approximate evaluation up to absolute error e > within 



C'(y log i • loglog i) — as opposed to ©(log i) — arithmetic steps. 
Given m € N and a polynomial p of degree d over a field of character- 
istic zero, the coefficient of to term X" can be computed within 
0(^d ■ M{y/n)) steps where M{n) denotes the cost of multiplying 
two degree-n polynomials. 

The same time bound holds for the joint calculation of any £ < ^Jn 
desired coefficients of to terms ni, . . . ,n£ < n. 



1 Introduction 

The naive way of calculating the n-th factorial Pn+i = {n+l)-Pn uses 0{n) arith- 
metic operations over Z. During its course, all lower factorials 1, 2, 3!, . . . , (n— 1)! 
are generated as well which might or might not be desirable. In the latter case, 
most of the intermediate factorials can in fact be bypassed and n! itself be 
calculated using only 0{^/n ■ logn • loglog n) integer operations. This had been 
observed by Strassen [23 Abschnitt 6] and is based on fast fourier transforms 
and polynomial multipoint evaluation. A generalization to the computation of 
the n-th element P„ of a recursively defined sequence of vectors 

P„+i = A{n) ■ P„ (1) 



* supported by DFG project Zil009/1-1 



2 Martin Ziegler 



with a matrix A of polynomials in n has been suggested in Section 6] , further 
improved in 3 and extended in |2I Theorem 5] to the joint computation of 
several (say, £, not necessarily consecutive) vectors P„ . . This result has yielded 
better upper complexity bounds for deterministic integer factorization and for 
computation with hyperelliptic curves |3I4| . 

The present work reveals the fast multi-evaluation of linearly recurrent se- 
quences to be in fact fundamental for several other problems as well; specifically 
to the evaluation of orthogonal polynomials and to the computation of specific 
coefficients of very high degree polynomials. Efficient handling of polynomials is 
itself a basic ingredient to many fast algorithms with a vast range of applica- 
tions and, as a matter of fact, plays in turn a major role in the fast evaluation 
of recurrent sequences. 

We first review and extend the previously known algorithms for linearly recur- 
rent sequences with both constant and with polynomial coefficients (Section O. 
These are then applied to other problems as follows: SectionOdcduces a roughly 
radial upper algebraic complexity bound ^7, uniformly on all orthogonal polyno- 
mials; and Section 21 presents computer algebra algorithms ^H] for determining 
specific coefficients of polynomials. This is of particular benefit in cases where 
the result has high degree n but only few (say, £ n) terms are desired at 
output-sensitive cost. 

In fact, all obtained running times are optimal with respect to £ in the follow- 
ing sense: As £ ^ n (that is towards the classical case of computing all entries 
of the sequence or coefficients of the polynomial) and with other parameters 
fixed, it converges to the asymptotic running time of the respective best known 
classical algorithm including logarithmic factors. Our new algorithms are thus 
true generalizations of the latter at cost increased by at most constant factors. 

2 Fast Evaluation of Linear Recurrences 

Recurrence equations like are ubiquitous in mathematics as well as computer 
science. Many (if not most) have no closed-form solution; and even if one does, 
it might not induce an efficient algorithm — compare n! above. 

In order to explicitly calculate the n-th term Pn, the naive approach sug- 
gested by Equation iteratively proceeds from Pq to Pi, P2, ■ ■ ■ , Pn-i,Pn 
and thus has running time proportional to n. However, being interested in P„ 
only, this might be out-performed by other methods which avoid computing all 
intermediate terms. For instance if fc x fc-matrix A does not depend on N, then 
repeated squaring yields within 0{k^ ■ logn) steps. This is already optimal 
with respect to n 7,, Theorem 13.14]; whereas in terms of fc, the running time 
has been further improvemed in '12' to time 0{k ■ polylog fc • log n) for computing 
the solution to 

Pn = ai ■ Pn-l + a2 • P„-2 + . . . + Cfe • P„_fe . (2) 

Notice that (0) is indeed a special case of with constant matrix A of com- 
panion form (jSJ and P„ := (P„, P„_i, . . . , P„_fc+i)^ 
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Theorem 1. Let TZ denote a commutative ring with 1 supporting multiplication 
of two polynomials of degree < n at cost at most M(n) > n. 

a) Given ai, . . . , Ok (z TZ, Pq, . . . , Pk-i G TZ, and rt G N, £ consecutive elements 
Pn, . . . , Pn+e^i defined by 0) can be computed using 0(^AI{k) ■ log n + £ ■ k"^ 
arithmetic operations in TZ. 

b) Given a constant companion matrix A G -j^kxk ^ j^k ^ vectors 
-Pnij-Pria; ■ ■ -f^f defined by (0) can be computed simultaneously using 
0{(, ■ k'^"^ ■ log ^) arithmetic operations in TZ where k < £ and ni < 712 < 
. . . < Ui n. 

Here, uj > 2 denotes any feasible exponent for matrix multiplication \1'M7\ : e.g., 
uj — 2.38. For £ > fc^, one may even choose uj = 2.34. 

Proof. Claim a) for £ = 1 is ^ Proposition 3.2]. Specifically, P„ is obtained 
as the calar product of (Pq, . . . ,Pk-i) with the coefficients of the polynomial 
X" mod / where f = ~ [ai + + • • • + OkX^^^) |T21 Theorem 3.1]. 
Therefore, once X" mod / is known, we can calculate X"+^ mod / = (X" mod 
/) mod / and P„+i using an additional number 0{k) of operations. Iteration 
thus establishes the case £ > 1. 

For Claim b), use JT21 Proposition 2.4] to compute all binary powers of A 
up to n within 0[k^ -logn). Therefore, each Pm is the product of Pq with J := 
C(logn) of these pre-calculated matrices A'^\ j = 1, . . . , J. In order to improve 
the induced naive running time oi 0{£ ■ k^ ■ logn) for the joint computation of 
Pni , ■ ■ • , Pni , batch the matrix- vector products into matrix-matrix products as 
follows: For each j = 0, . . . , J, collect alH = 1, . . . , £ for which P„. involves A^' in 
the above mentioned product; put the corresponding vectors to be multiplied to 
A?' as columns into a, k x £ matrix and multiply that to A"^^ using O^k'^ ■ [|]) 
operations. Here, uj — 2.38 is feasible due to JOI; alternatively, one can use 
0{k'^ ■ \^]) operations with cD = 3.34 TB*. Since £ > k (or £ > k^), this yields 
running time 0(^£ ■ k'^~^ ■ logn). 

More careful analysis reveals that it suffices to multiply only one vector 
(namely Pq) to A^ in time 0{k'^); and two vectors (namely Po and A^ ■ Pq) 
to A^ in twice the time; and, similarly on, to multiply in phase no.j only 2^ 
vectors to A"^^ as long as j < log(fc)/ log(tj — 2) with 0{k^~^) vectors mul- 
tiplied using a total of 0{k'^) operations dominated by the last phase. From 
j > log(fc) / log(a' — 2) on, switch to fast matrix multiplication. For j < log k, 
this involves only one bach and will thus take time 0{k'^) per phase, that is, a 
total of 0[k'^ ■ log fc) . Each phase no.j with log k < j < \og£ gives rise to 2^ vec- 
tors multiplied to A"^^ , grouped to 2^ /k batches and therefore taking 0(2^ ■k'^~^) 
operations, again dominated by the last one with duration 0(^-fc"~^). The final 
phases j = log £ . . . log n do not further increase the number of vectors multiplied 
to A"^^ because we are looking for only £ different results P„j , . . . , Pne ■ They 
thus induce total cost 0{^£ ■ fc"^^) each; times the number log j of final phases 
and added to the aforementioned O {k^ ■ log k) yields the claim. □ 
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For further improvement and regarding the very last paragraph of (12) . it seems 
worth while to attack the following 

Problem 2. Given ai, . . . , Ofc and Pq, . . . , Pk-i, compute Pk, . . . , P2k-i accord- 
ing to 10) in time o{k'^). 

Let us now relax the condition on A to be constant and consider matrices. . . 
2.1 ... with Polynomial Coefficients 

This case involves not matrix powers but matrix factorials like A{n) ■ A{n — 
\) ■ ■ ■ A{2) ■ A{\) —: W^j^i A{j). While the naive iterative approach leads to 
running time proportional to n, Chudnovsky&Chudnovsky have improved 
that to cost roughly radical in n 9, Section 6]: 

Fact 3 Let TZ denote a commutative ring permitting multiplication of two poly- 
nomials of degree < n at cost at most M(n) > n where M satisfies some 
standard regularity conditions bottom of p. 582]. Consider a k x k matrix 
A{N) with polynomial entries aij{N) G Ti[N] in N of degree < d. Given (the 
coefficients of) A and n € N, one can calculate the matrix 11^=1 ^(j) using 
0{k^ ■ AI{y/nd) ■ logn) operations in TZ. 

Proof. Let ly := \\/n\ and consider the Baby-Step/Giant-Step approach of 

i) determining the (coefficients of the) polynomial matrix C{N) A{N + v) ■ 
A{N + v -!)■■■ A{N + 2) • A{N + 1) e 7^[iV]'=^'=; 

ii) multi-evaluating C at 0, v, 2iy, . . . , \ n/i>\ ■ v =: h] 

iii) calculating IljLi ^(j) by iterative multiplication of the matrices C(0), C{v), 
. . . ,C{h — v) obtained in ii); 

iv) finally computing 0^=1 ^0) by iterative multiplication of the result from 
iii) with the matrices A{n), A{n + 1), . . . , A{n — 1). □ 

The possibility for further improvement of Fact|31to, say, ©(polylogn) for fixed 
{k, d) is unknown already in the case of the scalar factorial n! and related to 
deep open class separation problems in complexity theory |8I21| . For rings of 
characteristic and fixed d however, improvements in particular in terms of the 
size k of the matrix A have been obtained by Bostan&Gaudry&Schost jH 
Theorem 5] as well as a generalization to the simultaneous computation of 
i < 0(ni/2-f ) matrix factorials n"4i ^(j), i = l,...J. 

The present section reviews this result, presented with a new proof and in- 
cluding in its analysis the running time's dependence on the degree d of the 
polynomials in A as well as on the number £ of elements of the sequence to be 
computed non-trivially extended beyond ^/n (Theorem 01d). Further claims deal 
with a generalization (Theorem^) and improvements for the frequent case that 
A has companion form (Theorem 2): -|-d). In the sequel, capital letters X and N 
denote formal indeterminates of polynomials whereas lower case x and n refer 
to variables with values. 
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Theorem 4. Consider a k x k matrix A{N) with polynomial entries Oij (N) G 
n[N] of degree < d. 

a) Given (the coefficients of) A as well as i pairs of integers [mi,ni) with 
< rm < Ui, one can simultaneously calculate the £ matrix products Bi := 
n"=m, A{j), i^ !,...£, using 

operations in TZ where n :— maxi ni> d ■ log^ d. 

b) If nii = 1 and, instead of the matrices Bi themselves, the I matrix-vector 
products Pi = Bi ■ Pq for a given Pq G TZ'' are desired, this can be accom- 
plished using 

0(^k^ -mm {V^,nd/e} + k^-M{V^) + fc2 . ^ . m^M) 
operations in TZ. 

c) In case that the matrix A{n) is of companion form and invertible in TZ^^^ for 
all integers n exceeding a given m G N, then the £ vectors Bi-Po, i ~ 

can be computed using 

o(^e-M{V^) + k'.£.Mll^^ 

operations in TZ. 

d) If additionally n > k^ and the polynomials constituting A{N) obey the re- 
stricted degree condition deg(aij) < j, the running time further reduces to 

The algorithms are uniform and — except for the roots of unity exp{2TTi/n) 
employed in the FFT when M{n) = 0{n\ogn) — free of constants. 

2.2 Proof of Theorem gl 

Reconsider the proof of Fact O with its four steps, but leave the value of the 
trade-off parameter v open for the moment to be chosen later as an integral 
power of 2. We also remark that the coefficients of the polynomials arising in 
Steps i) and ii) may be taken with respect to any common (rather than the 
standard monomial) basis. As a matter of fact, regarding the hypothesis that 
n > dlog^ d, it pays off to first spend C'(A;2-M(ci)dogd) operations for converting 
A{N) to the falling factorial (also called Newton) basis ^ Section 4.2] because 
that will accelerate evaluation and interpolation on arithmetic progressions by 
a logarithmic factor Section 4.3]. Specifically exploit that evaluating a 
degree- 1? polynomial p simultaneously at K points of an arithmetic progression 
takes, by simulating \K/D'\ multipoint evaluations oi p at D — deg(p) points 
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each, 0{{^ + 1) • M (D)) operations 14, Theorem 4.24]. Step ii) thus succeeds 

within a total ol 0[k^ ■ + 1) • M{vd)) operations. 

Concerning Step i), Section 6] combines fast matrix multiplication with 
fast polynomial arithmetic and achieves running time 0(A;" • M{vd)\ogv). [3 
has observed that this allows for improvement, provided the characteristic of TZ 
is zero (or larger than m + vd). Their proof is a recursive descend on n being 
an integral power of 4 with a complicated consideration for the general case. We 
obtain a considerable simplification in particular in Sub-Steps a) and 7) below 
by working in the Newton rather than monomial basis: 

a) Perform separate multipoint evaluations to obtain the matrix values 
A{m + l),A{m + 2), . . . , A[m + 2vd) € VJ''^^ for arbitrary to g N. Since 
the evaluation points form an arithmetic progression this takes, similarly to 
Step ii), a total of 0[k^v ■ M{d)) operations. 

/?) Determine the matrices C{m), C{m + 1), . . . , C(m + vd — 1) E T^fexfe using 
0{k'^vd) analogously to 01 Proposition 2]. Specifically, compute the 2v 
products A{m + v) ■ A{m + — 1) • • • A{m + v — j + \) and A{m + v + j) ■ 
A{m + v+j — l) ■ ■ ■ A{m + v+\) for i = 0, 1, . . . , within 0{k'^v) and observe 
that each C{m), . . . , C(m + — 1) is composed of two such product ranges. 
[C(to + C{m + 2i^ - 1)] , . . . , [C(to +{d- l)v) , . . . , C(to + dv - l)] 

are obtained similarly. 

7) Interpolate the vd matrix values from Sub-Step /3) to determine the (coeffi- 
cients in the factorial basis of the) matrix polynomial C{N) of degree < vd 
at the expense of • M{vd)) operations IT, Theorem 4.26]. 

Since M{vd) < vM{d), the asymptotic cost of Step ii) above exceeds that of Sub- 
Step a). Step i) gives rise to an additional running time oiO{k'^vd+k^M{vd)). 
Step iii) uses 0[{1 + j;) ■ k^) operations and Step iv) another 0{v ■ k^). 

If, instead of the matrix 0^=1 -^ii) itself, only the vector P„ = 0^=1 ^{i)'Po 
is to be calculated, we can replace the C'(A:'^)-time matrix-matrix products in 
Steps iii) and iv) with 0(fc^)-time matrix-vector products. If furthermore A{n) is 
in companion form and invertible for all integers n> m, also Step i/3) accelerates 
to 0{k^vd} by LemmalHt) below. 

Towards the multi-evaluation case £ > 1, suppose for a start that all rii and 
TOi are multiples of v. We thus seek an algorithm for the following step: 

v) Simultaneously calculate the £ matrices nj=mi+i ^U) (oi' their respective 
product with Pq) where fii :— \ ni/v\ ■ v and rhi := \mi/v~\ ■ v, i = 1, . . . , £. 

For Claims b+c) with fhi = 0, it suffices to iteratively multiply the matri- 
ces C{0),C{v),C{2v), . . . obtained in u): this yields all products 11^=1^0)) 
s — 1, . . . , \n/v\ =: I and takes 0{^k^) steps. For Claim a) with general 
fhi, we have to calculate i products IljLri matrices Cj := C{jv) where 

the ranges [ri,Si), i — !,...,£ may be arbitrary integer intervals contained in 
[0,/). To this end recall the Range Tree from Computational Geometry |21 
Section 5.1]. Specifically, consider the set S := {0,ri,Si : i = !,...,£} or- 
dered as S" = {0 = to < ^1 < ■•■ < ^Af-i) where N < min{2£ -I- 1,/}. 
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Now compute first the N products Y\t^[tj,tj+i)^t^ 3 ~ 0, ...iV — 1, invoking 
0{^ - \tj+i — tj — 1|) = 0{I) matrix multiplications; then compose from these 
results the N /2 products nte[t2j,t2j+2) *' ^ ~ 0,...,N/2 — 1 using further 
N/2 < 1/2 matrix multiplications; then the N /A products Jliefti (4 +4) 
so on. So after a total of 0{k'^I) operations, all products ranging over a binary 
interval are prepared which concludes the initialization of the Range Tree. Now 
for its application, observe that each interval [r;, .s;), z = 1, . . . , is a disjoint 
union of C'(logiV) < 0{\ogt) of these binary intervals. This concludes the en- 
tire Step v) within time 0{k'^{^ + aogt)) in case of Claim a) or 0{k'^^) for 
Claims b+c). 

For the final goal, that is to 
vi) simultaneously calculate the (. matrices 11^=™; ^(j) (o'^ their respective 

product with Pq), i = 1, . . . ,1, 
invoke the Range- Tree idea once again. This time, the initialization phase con- 
sists in preparing the (coefficients of the) v/2 polynomial matrices C^/2(-/V) := 
A{N +^)- A{N - ■ ■ A{N + 2)- A{N + 1) e 7^[iV]^x^ C,/i{N), C,/^{N), 

. . . , C2{N), Ci (N). Due to the exponentially decreasing size v, this will together 
infer only the same cost as Step i). 

In the application phase, first multi-evaluate C^/2{N) at those hi whose 
difference to is at least z//2 — ©(fc^ • (-^ + 1) • Ad{ud)) operations as in 
Step ii) — and multiply them to the already computed results from Step v) at 
the expense of another 0{k"-£) and 0{k'^-£) for Claims a) and b-|-c), respectively. 
For Claim a) do similarly for those rhi differing from by at least iy/2. Now 
repeat with multi-evaluating Cv/4,{N), then Cv/%{N) and so on. By the same 
argument as above, this will affect the overall running time by at most a factor 
of 2 while in the end yielding the desired resulting values. 



iii) 
-Mv) 

V) 

vi) 



Case a) 
k^vd + k'^M{vd) 

+£log£) 



Case b) 
k^vd^k^Mivd) 
k\^ + l)M{vd) 

el^^ + i)M{vd) 



Case c) 
k^vd + k^M{vd) 
fc2(^ + l)MM) 
k^{l + ^ + u) 
k^^ 



Case d) 
k^M(y + k) 



+ l)M{u + k) 



1/ 



Fig. 1. Big-Oh running times of steps i) to vi) in cases a) to d) 



It thus remains to confirm that the costs of the above steps i) to vi) are all 
covered by the running times claimed in a), b), and c). To this end, choose f 
as (an integral power of 2 close to but not exceeding) \fnfd if ^ < \fnd and 
V := n/i otherwise. We remark that u < \Jn/d holds in both cases, so 
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which amounts to 0{M{V^)) in case e < Vnd and to ■ M{^)) if £ > 

•s/wf; Claims a+c) are thus immediate. For Claim b) observe furthermore that 
vd = min{ ^/nd, nd/i}. 

Case d) admits, in addition to Case c), further improvement based on the ob- 
servation that the degree of the matrix polynomial(s) C{N) involved in Steps i), 
it), and vi) reduces from to + fc by virtue of Observation ^p) below. 
This yields the running times in the last column of Figure ^ Then choose 
V := y/n > k. □ 



2.3 A First Application and Some Tools 

Theorem 0]d) includes 4, Theorem 5] by restricting to ^ < 0{^/n) and con- 
stant d. Another consequence, we have the following non-trivial complexity in- 
terpolation between, on the one end, Strassen's aforementioned algorithm |27[ 
Abschnitt 6] computing one single factorial nl (that is, the case ^ = 1) and, 
on the other end, the obviously optimal naive iterative 0{n) calculation of 1, 2!, 
3!, . . . , n! (that is, the case £ = n): 

Corollary 5. OverU = Z with M{n) = O(nlognloglogn) O Theorem 8.23] 
and k = 1 = d, any I desired factorials nil < 712! < ... < ni\ = n\ can he com- 
puted simultaneously using O {^/n ■ log n ■ loglog n + l- log j ■ loglog j) arithmetic 
operations. 

Further applications will be given in the sequel. In many of them, the matrix A 
according to Equation Q is structured 23 • For example a companion matrix 
as well as its inverse 



F = 



( fi /2 /a 
1 
1 
1 

V 



/fc-1 fk\ 






1 0/ 



F-' = 



( 






-/l -h 

fk fk 



\ 





1 

-fk- 



(3) 



is described by the k parameters {fi, . . . , fk) as opposed to the k^ independent 
entries of a general matrix. Theorem ^ relies on fast powering of companion 
matrices, that is, on efficient calculation of iterated products of the same F. 
The following tool, employed in the proof of Theorem^), considers products of 
several companion matrices and might be of its own interest: 



Lemma 6. Given m companion matrices Fi 



.F„,en 



kxk 



a) their product Fi ■ ■ ■ Fm can be computed in 0{m ■ k^) steps 

b) as well as in 0{k'^ ■ (1 + f)) steps. 

c) If all Fi, . . . , F„i are invertible, then the m — n + I products 



Fi---Fn 



F,---F„ 



J- rr). — 



m— n+1 



■ F 



can he computed simultaneously in 0{m • k^) steps. 
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Proof, a) The multiplication of a vector to a companion matrix, from left F-v as 
well as its transposed from left v''^ ■ F, both takes 0{k) operations. Therefore 
the multiplication A- Fm by an arbitrary square matrix like A = Fi ■ ■ ■ Fm~i 
takes 0{k'^) steps. Iterating establishes the sought 0{m ■ k^) algorithm. 

c) Compute the first product Pi := Fi ■ ■ ■ Fn as in a). For the subsequent terms 
Pj+i '■— Fj^i ■ ■ ■ Fn+j — Fj'^ ■ Pj ■ Fn+j exploit as in a) that multiplication 
by Fn+j as well as by F~^ takes only 0{k^) steps. 

b) Recall QTl Lemma 3.1] the formula 



Fi-F2---Fk-i-Fk 



(I-L) 



R 



where R and L denote (respectively lower and strictly upper triangular) 
matrices plainly consisting of the k'^ joint parameters of Fi, . . . ,Fk. Both 
multiplication with R and the inverse (J — L)~^ are feasible within 0{k'^) 
[3 Proposition 16.6]. This establishes the case m = k; the general case 
now follows by partitioning m into [m/fc] blocks of length k each according 
to the grouping {Fi ■ ■ ■ Fk) ■ {Fk+i ■ ■ ■ F2k) (-Fm-fc+i • • • Fm)- □ 

The following improvement to Lemma seems conceivable: 

Problem 7. Given k companion matrices of size k x k, compute their product 
using 0{k^ ■ polylogfc) operations. 

Another ingredient to the proof of Theorem is the following 



Observation 8 a) Let A G TZ[X 



kxk 



id b G TZ[X]^ denote a matrix and 



vector of polynomials of deg{bj) < m — j and deg(aij) < 1 + j — i. with the 
convention deg(O) = — oo. Then, c := F ■ b has deg(cj) < m + 1 — j . 
h) Let Fi, . . . , F„i E TZ[X]''^'' denote polynomial companion matrices with {fn, fi2, 



• • • J fik) the first row of Fi, respectively. If deg{fij) < j, then B 
has deg(aij) < m + j — i. 



n; 



=1 



Ff 



deg(A) = 



/12 
1 




\0 



k-l k \ 

k-2k-l 



deg(6) = 



/ 



m — 1 
TO — 2 

m — fc + 1 
m ~ k 



Proof, a) is a straight-forward consequence from deg(p • q) < deg(p) + deg{q). 
b) follows by induction on to, applying a) to each column b oi B. □ 



3 Fast Evaluation of Orthogonal Polynomials 

This sections concludes from Theorem^]:) that any family of classical orthogonal 
polynomials has at most roughly radical complexity^ 0(^/n ■ logn). 

^ This is not to be confused with the Paterson&Stockmeyer Result |24l that every 
polynomial has complexity 0{^/n) when neglecting operations in the coefHcient field. 
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Horner's Method evaluates a fixed degree-n polynomial p e at given 

X within 0{n) arithmetic steps. While this is optimal in the 'generic' case |3 
Corollary 5.11], many specific polynomials do admit faster evaluation; mono- 
mials X" for instance in time O(logrt) by means of repeated squaring. Also 
Chebyshev's Polynomials r„ G Z[X] have complexity logarithmic in their de- 
gree; this can be seen either directly from the quadratic recurrence 

T2n{X) = 2T^{X) - 1, T2„+i(X) = 2T„+i(X)-r„(X) - X 

or by applying Theorem ^ to the linear vector recurrence 

/T„+i(X)\ ^ / r„(X) \ . ^ /^2a; -1\ 

\T^iX)J 'W-iiX)J' ■ {l Oj 

with matrix A independent of n ^ Section 4], [TS|. RecaU that (r„) forms an 
orthogonal system on [—1,-1-1] with respect to the weight p{x) = (1 — a;^)~^/^. 
Other weights lead to other families of orthogonal polynomials. They are a im- 
portant tool in Mathematical Physics due to their approximation properties |22| . 
The Legendre Polynomials P„(X) for instance are orthogonal on [—1,-1-1] with 
respect to p{x) = 1. 

Theorem 9. Every monic family C M.[X] of classical orthogonal polynomi- 
als has complexity 0{y/n ■ logn). Any I members Pm , . • . , Pne of such a family 
have joint complexity 0{^/n ■ logn -I- £ ■ log j). The constants in the big-Oh 
notation are independent of the family (Pn)- 

Observe that, as £ — > n (that is concerning the problem of evaluating all poly- 
nomials Pi{x), . . . , Pn{x)), the running time converges to 0{n) which is clearly 
optimal. 

Proof. It is well-known that any family (P„) of classical orthogonal polynomials 
satisfies a three-term recursion 



P„+l {X) = (A„ ■X + Br,)-Pn {X) - Cn ■ Pn-l{X) (4) 

see, e.g., |22l Section II. 6. 3]. In fact for monic (P„), A„,i?„,C„ have turned 
out as rational functions of n with respective numerator and denominator poly- 
nomials a{N),biN),c{N),aiN),(3{N),-f{N) e R[N] of degree at most 4 [SI 
Theorem 1]. Rewriting Equation we obtain 



Pn+i{X)\ ^ /a(n)/3(n)7(n) • X -a{n)P{n)c{n)\ / 
Pn{X) ) [ a{n)p{nh{n) \Pn-i{X) 



a recursion with polynomial coefficients of size k and degree d independent of 
the family (Pn) under consideration. Now apply Lemma llOl below with M(n) = 
0{n ■ log n). □ 
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Lemma 10. Let J- denote a field of characteristic permitting multiplication 
of two polynomials of degree < n at cost at most M{n). Let C !F[X]'' be a 

sequence of polynomial vectors satisfying 

s(n + l,X).P„+i(X) = A{n,X)-P,,{X) (5) 

with companion matrix polynomial A E J^lN, X]''^'' and s G TlNjX] both of 
(total) degree < d. Finally suppose that s{n,x) ^ for all n E N and all x E T , 
the latter denoting an arbitrary subset of T . 

Given x G J- , Pq{x) G , and (the order k^d^ coefficients of) both A and s, 
one can simultaneously evaluate Pnj^ix), . . . ,Pn^{x) using 

0{k^-M{^d) + fc2.^.Mn^^ 
arithmetic operations over T where maxjii'^, n^} < n. 

The multi-evaluation expressed above refers to the indices ni, . . . , of the se- 
quence and should not be confused with multipoint evaluation of a polynomial 
at several point xi, . . . , a;„ as, e.g., in 

Proof fLemma YTI^) . Let cr„(X) := Y\a=i ^^"^ consider the sequence Q„ := 

cr„ • P„ obviously satisfying Q„+i(X) = A{n,X) ■ Qn{X). After plugging in x 
into A using 0{k^d^) arithmetic operations, one arrives thus in the situation 
of Theorem 0J;). Indeed, if ai^k{N,x) G ^^[N] is the zero polynomial, then we 
may truncate both the last column and row of A and reduce the dimension k 
of the recurrence by one; whereas if ai^k{N,x) is not identically zero, it has 
only finitely many roots and A{n, x) is invertible for all n > m with some 
appropriate m which can easily be found using standard bounds. This yields the 
joint computation of Qm{x), . . . , Qmix) within the claimed time. Now exploit 
an+i{X) — s{n + 1,X) ■ cTniX) to similarly compute cr„j(x), . . . ,o'„j(a;). Since 
these are units by assumption, another k£ divisions yield the desired values 

Pn,{x) = Qni{x)/(Jn,{x). □ 

4 Fast Partial Polynomial Arithmetic 

The present section applies fast evaluation of linearly recurrent sequences to the 
problem of computing single or few specific coefficients of a polynomial of large 
degree. 

Based on FFT-methods, many algorithms have been devised which yield 
fast solutions to many problems in polynomial arithmetic ^1 Part II]. These 
tend to be optimal in running time up to poly-logarithmic factors, simply by 
comparison with the sizes of the input and output. However the operations of 

composition: given p,q E ^[X], determine po q; 
powering: given p G J-[X] and n G N, determine p"; 
inversion: given p G ^[X] with p{0) ^ and n G N, 
determine q := 1/p mod X" G -^[-'^]. 
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generate results of degree significantly larger than the input: quadratic in the 
first case, unbounded^ in the second and third. This leaves room for improved 
algorithms in cases where only one or few terms of the power or inverse are 
desired — preferably with output-sensitive running times proportional to the 
number of terms desired. For instance, 7, Corollary 2.3] accelerates polyno- 
mial multiplication when some coefficients of the result are already known. Our 
interest lies in situations where coefficients are not known nor of interest any- 
way, that is, in the partial calculation of polynomials. In this spirit, presents 
improved algorithms for computing the lowest £ coefficients of the result where £ 
coincides with the degree d of the input 7, Corollary 2.33, Theorem 2.34], 
for composition for instance in time 0{(P^^ ■ polylogd). Our result deals with 
determining either the £ most significant as well as arbitrary coefficients. 

Theorem 11. Let !F denote a field permitting multiplication of two polynomials 
of degree < n at cost at most M(n). 

a) Given p G .?^[^] of degree d and n G N, the £ > d most significant coefficients 
of the power p" G .?^[^] can be computed in time 0{^M{£) + log ^) . 

b) Given p G ^[X] of degree d with p{0) ^ and n G N, the £ most significant 
coefficients of q := 1/pmodX" can be computed in time 0(^M{£ ■ log j) • 
log j) where d < i < n. 

c) Let T have characteristic zero. Given m €z N, p €z ^[X] of degree d, and 
ni, . . . ,n£ G N, the coefficients of p™ to the terms X"-% i = !,...,£, can be 
computed simultaneously in time 

0{d^-i^ + l).M{V^)) 

where d < £ < n and ni, . . . ,ni < n and n> d^ . 

d) Given p G .?^[^] of degree d with p(0) ^ and n G N, the n-th to {n—l+£)- 
th coefficients of q := 1/p can be computed simultaneously in time 0(^M(d) ■ 
logn -I- £d) where d < n. 

Proof, a) is easy based on the observation that the £ top-most coefficients of 
p ■ q depend only on the £ top-most coeffients of both p and q. More formally, 
using the convenient notation of ^ Section 2], it holds 

bJdeg(,)-. - [rev(deg(p),p)r+\ \p ■ q^ ^ \\pY ■ UYX . 
and rev (deg(p) -)- deg(g),p • gr) = rev ( deg(p),p) • rev ( deg(g), g) (6) 

where rev (iV, X^^^o '*»^") S«=o "a^-"^" [L^^o '^"^"l'' •= 

YL^o anX", [ Y^^^o a„X"J := X]^=o an+eX'^ 

Now calculate first q := p^^'^ (w.l.o.g. £/d integral) of degree £ by repeated 
squaring within time 0{£); and then obtain from that the £ top- most co- 
efficients of g"''/^ as the £ least ones of rev (g""*/^) = rev{q)"'^^^ based on 
Equation © and ^ Corollary 2.33] within O (M {£) + log ^) . 



^ We refer to the algebraic size of course; in terms of the btt size of n, the output is of 
exponential degree — still too large. 
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b) Consider the classical Newton iteration 

q h-^ 2g - g2 j^oJ j;^2deg(9) j-y) 

which yields a sequence of 'approximations' qj of doubling degrees such that 
q = q mod X'^'^^'^^\ In particular q itself of degree n (w.l.o.g. a power of 2) 
is obtained after J := logrt iterations with the running time governed by 
the cost of the polynomial multiplications in Equation and thus dom- 
inated, due to the exponentially growing degree of q, by the last step [3 
Section 9.1]. 

Let us analyze Newton's iteration backwards regarding which coefficients of 
q = (jj's predecessors qj-i the £ top- most coefficients of q depend on. To 
this end observe that Equation ((TJ turns some qi of degree m first into the 
polynomial 2qi — qf ■ p of degree 2m -|- d and then cuts off its d top-most 
coefficients in order to obtain qi^i of degree 2m. By Equation the k 
top-most coefficients of qj-i having degree m thus depend on and can be 
computed in time ©(Af (fc)) from the k + d top-most terms of qj-i-i having 
degree m/2. In particular for the sought t top- most coefficients of q having 
degree n to be calculated efficiently, it suffices to know the £ + dl top-most 
ones of qj-i having degree as long as £ + dl < n/2^ . Since d < £, the 
algorithm may choose / : [log j — loglog j] and first perform the classical 
Newton iteration from qg to q,j-i; from this polynomial of degree ©(.^-log j) 
continue via steps J— (/— 1) to J, keeping at stage no. (J—i) only the £ + di 
top-most coefficients. This yields the claimed overall running time, 
d) Recall Leibniz' Rule for higher derivatives of a product 

= j:(fj-f^'^-9^-'^ . (8) 
fe=0 ^ ^ 

Applied to f := p and g := l/{nl ■ p), we obtain for n> d> 1: 
1 \(") , n.^ I 1 \ 



0^ - E*-.'"-G 



nl-pJ ko \{n — ky.-p 

since p'*'^ = for A: > d = deg(p). Evaluated at a: = and normalized, this 
constitutes a linear recurrence like (O of depth d—1 with constant coefficients 

■■= a recurrence p.41 Eq.(*)] for P„ (^)<"^(0), that is, the 

n-th coefficient of q — l/p—. X]^o PnX"-. Now use Theorem^), 
c) W.l.o.g. j((0) ^ 0, otherwise consider p/X. Apply Equation (IHl) to := 
^pm-i-i^(") ways^ where D : p i-^ p' denotes the differential operator: 



n — 1 



fc=i 



k=0 



inspired by EH p. 134] 
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because derivatives of p higher than d vanish. Equating right sides yields 

. ■' 

= :ak(n) 

which, evaluated at a; = and normalized by p(0), establishes a linear recur- 
rence for Pn := l?"p"'(0), that is, the n-th coefficient of (up to a factor 
n\). This recurrence has depth d and involves coefficients polynomial in n of 
deg(afc) < k. Now apply Theorem^). □ 

Problem 12. Does that is the concatenation of powering and inversion, 

also admit fast partial computation? 

The related question concerning the product of powered and inverted polynomi- 
als is the subject of the following section: 

5 Closure Properties 

Classical algorithms for fast polynomial arithmetic have all significant coefficients 
as input and output; they are thus obviously closed under composition and 
can be combined to solve more advanced problems |2f)j . For partial polynomial 
arithmetic, on the other hand, the output of two algorithms calculating few 
coefficients of two respective high-degree polynomials p and q cannot simply be 
fed into a third algorithm in order to obtain merely one coefficient of, say, the 
product p ■ q. Instead, we refer to the framework of 

5.1 Holonomic Functions and Recurrences 

Definition 13. A function f{x) of one variable x is holonomic of depth k if it 
satisfies a linear ordinary differential equation of order k 

ao{x)-f<^''\x)+ai{x)-f'^''-'\x) + --- + ak-i{x)-f{x)+ak{x)-f{x) = Vx 

where the Oi are requred to be polynomials. 

A sequence {Pn)„ in the field T is holonomic of depth k and degree d if it 
satisfies a linear recurrence 

aoin) ■ Pn+k + ai{n) ■ Pn+k-i + • • • + ak^i{n) ■ P^+i + ak{n) • P„ = (10) 

for all n Cz N where ai G J^[N] must be polynomials of degree at most d. 

By Theorem 01 holonomic sequences admit multi-evaluation in roughly radical 
time. This was exploited in Theorem II Ifc-l-d^ whose proof reveals the following 

Example I4. Let p G ^[X] denote a polynomial of degree d with p{0) ^ 0. 
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a) The sequence of coefficients of 1/p is holonomic of depth d and degree (i.e., 
with constant coefficients). 

b) For arbitrary n g N, the (finite) sequence of coefficients of is holonomic 
of depth d + 1 and degree d. 

It is known that a power series represents a holonomic function iff its coefficients 
form a holonomic sequence; see e.g. j20l , p. 3]. The vast and important classes of 
hypergeometric TS*. Section 5.5] and generalized hypergeometric functions [221 
for instance strictly include the holonomic ones. Theorem01also yields a roughly 
quadratic acceleration for their approximation: 

Corollary 15. Fix a real or complex holonomic power series f(x) — X]^o ^nX^ ■ 
Then the polynomial given by f 's first N terms, that is, pn{x) := '^n=Q '^riX^\ 
can be evaluated at given x using 0{\/N ■ log A^) arithmetic operations. 

In particular suppose f has radius of convergence R — 1/ lim sup^^^^^ ^|c„| > 
and fix < r < R; then, given \x\ < r, one can approximate f{x) up to pre- 
scribed absolute error e > within 0( 



log 7 • loglogi) steps. 

Proof. Let (|10ll denote the holonomic recurrence satisfied by (c„)^^ ; for simplicity 
with leading coefficient oq = 1 — otherwise rescale as in the proof of Lemma [TUl 
Then the sequence of values {pn{x)) satisfies 



/ Pn{x) \ 

CN+1 

cn 

\CAr_fe+i / 



/I 
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■ ^\ 
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■ aa 
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/ Pn-i{x)\ 
Cn 



J V CN-k J 



that is, a recurrence of the form and thus supporting evaluation in the claimed 
time by virtue of Theorem 0] To choose iV, fix p G {r,R). Then |c„| < M ■ 
for all n with some appropriate Af g N — Cauchy's Estimate. Thus f{x) differs 
from ppf by at most 



n>N 







M ■ 



















N 



N 
p 



which drops below e for some N < 0( log - ^ 



□ 



5.2 Product of Fast Partially Computable Polynomials 

The class of holonomic sequences is closed under addition, multiplication, and 
convolution [251 Theorem 2.1]. Careful inspection of the latter proofs reveals 
bounds not only on the resulting depth but also on its degree. 

Proposition 16. Let (Pn) and (Qn) denote two holonomic sequences of degree 
d and depths k and I, respectively. Then 
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a) their sum {Pn+Qn)„ is holonomic of depth K < k+£ and degree D < {k+tj^d; 

b) their product {PnQn)^ is holonomic of depth K < k£ and degree D < k^t^d; 

c) their convolution (X]m<n^m ' holonomic of depth K < k ■ £ 
and degree D < k^£^d. 

As the degree of the resuhing holonomic equations is closely related to the run- 
ning time of the Gaussian Elimination as the most expensive component in the 
algorithms in |25l Section 2.1], upper bounds on the complexity of the latter 
emerge by consequence of ProDOsition ll6l Our present interest however is closure 
under multiplication of fast partially computable polynomials: 

Corollary 17. Let T have characteristic 0. 

a) Given p,q (z of degrees at most d with q{0) ^ and given m G N, one 
can compute £ arbitrary coefficients of p™ ■ 1 /q within 

0[d'-MiV^) + d'-£-^^^) 

operations over T . 

b) Given pi,P2 € ^i^] of degrees at most d and given mi,m2 G one can 
compute £ arbitrary coefficients of p™^ • p™^ within 

o[d^-M{V^) + d'-£-^^^) 

operations over T . 

Problem 18. Theorem0Ji) yields improved multi-evaluation of holonomic recur- 
rences with coefficients of decaying degrees deg(aj) < j. We have already applied 
that in Theorem II Ih) for the partial computation of based on Equation 0. 
This raises the question whether also the product/convolution of two holonomic 
recurrences with decaying degrees is again one of decaying degree. 

Proof ( Corollary | j7| ). Since multiplication of polynomials corresponds to the 
convolution of their coefficient sequences, combine Proposition lltib ) with Ex- 
ample ^] to see that the coefficients of • 1 /q and p^^ ■ p^^ form holonomic 
sequences of depth D < 0{d^) and degrees K < 0{d^) and K < 0{d^), respec- 
tively. Then apply Theorem^). □ 

The proof of Proposition^] uses the following extension of Observation |SJ 

Observation 19 a) Let A = (aij) e !F[X]''^'' denote a regular k x k-matrix of 
rational functions in one variable X with both numerator and denominator 
of aij {X) polynomials of degree at most d. Then the rational functions which 
consists of have degree at most dk^ . 
b) If furthermore the denominators in A are identical for each column, that is, 

aij{X) = pij(X)/qj{X); then A^^ has degree at most dk. 
The entries of A~^ can in fact be achieved to have a common denominator while 
still observing the above degree bounds. 
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Proof (Observation^^. By multiplying each column with the denominators it 
contains, Claim a) immediately reduces with d — dk to h). For the latter, exploit 
fc-linearity of the determinant in order to obtain Y[j=i Iji^) of degree< dk as 
the denominator of det(A) with numerator P J2aeSk ^S^i'^) Y['j=iPcr{j),j 
of degree at most dk as well. Now has entries det{Aji)/ det(yl) based on 
Cramer's Rule with Aij a sub-matrix of A and its determinant thus a rational 
function of degree at most d{k—l). More precisely, the denominator of det{Aijg) 
is rij^jo Ij^-^^ ^^"^ ilaxs cancels out in det(Ay )/ det(^) against the denominator 
of det(j4), leaving P as common denominator of all entries in A^^. □ 

Proof (Proposition^^. By prerequisite, the fc-shifted sequence (Pfe+„)^ is a 
linear combination of the original (i.e., 0-shifted), the 1-shifted, . . . , and (fc — 1)- 
shifted one; a linear combination with coefficients being rational functions of 
degree at most d and common denominator. By induction on to, also the (fc+TO)- 
shifted sequence is a linear combination of the first k shifts — this time with 
coefficients of degree at most md and common denominator. 

In particular, the vector space U (over the field T{N) of rational functions in 
N) formed by all shifts of (Pn)„ has dimension at most k] similarly, the shifts of 
(Qra)„ give rise to a vector space V of dimension at most t. Therefore, the vector 
space U + V oi all joint shifts is at most {k + i?)-dimensional, that is, latest the 
(k + ^)-shift of (P„ + (5n)„ is a linear combination of its predecessors: closure of 
holonomic sequences under addition at depth at most k -\- (.. 

In order to estimate the degree of the rational coefficients involved in the 
latter linear combination, express each of the first k + 1+1 shifts of (P„ + (3ri)„ 
as linear combinations of the first k shifts of (-P„)„ and the first I shifts of ((5n)„- 
By the above remark, this gives rise to a {k + 1) x (fc + f + l)-matrix B over 
T{N) with entries of degree at most d ■ max{fc,^} and in each column at most 
two different denominators — which is easy to turn into degree D < d ■ (k + £) 
with column- wise single common denominators. The k + £ + I columns of B 
are linearly dependent, either by the above considerations or simply due to its 
format. 

Suppose for simplicity that the first K :— k + £ columns are independent — 
otherwise we argue similarly to obtain an even shorter and lower-degree recur- 
rence. Expressing the {k + £+ l)-st column by these first ones yields an explicit 
representation of the (fc -I- £)-shift of (P„ -|- Qn)^ in terms of its first k + £ shifts. 
Denoting by b the last column of B and by A its first k + £ columns, we obtain as 
the coefficients of this representation the vector A~^ - b which, by virtue of Obser- 
vationlTi^l consists of rational functions of degree 0{DK) = 0(^d{k + £)'^) with 
common denominator. We have thus arrived at the desired holonomic recurrence 

^ for (p„ + g„)„. 

For {Pn-Qn)^, consider the tensor product vector space U^V of dimension at 
most k-£to obtain a recurrence of the claimed depth. A generator of U(E)V is the 
collection of mixedly-shifted product sequences {Pn+i-Qn+j)„ with < i < k and 
< j < £■ Therefore, each of the K := k-£ (singly but farther) shifted sequences 
(P„_l_m • Qn+m)„, < m < K , cau be expressed as a linear combination of this 
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generator; in fact with coefBcients being rational functions of degree D < dK 
with common denominator. Putting them into a {K-\-l) x i^-matrix and arguing 
as above, we obtain a representation of the iC-shifted sequence {Pn+K ■ Qn+K)„ 
as Hnear combination of the m-shifts, Q <m < K, with coefficients being rational 
functions of degree 0{DK) = OidPf). 

The proof for convolution proceeds similarly. □ 



References 

1. B.K. Alpert, V. ROKHLIN: "A Fast Algorithm for the Evaluation of Legendre 
Expansions", pp.158-179 in SIAMJ. Sci. Stat. Comput. vol.l2:l (1991). 

2. M. DE Berg, M. van Kreveld, M. Overmars, O. Schwarzkopf: "Computa- 
tional Geometry", Springer (1997). 

3. A. BoSTAN, P. Gaudry, E. Schost: "Linear Recurrences with Polynomial Coef- 
ficients and Computation of the Carticr-Manin Operator on Hyperelliptic Curves" , 
pp. 40-58 in Proc. 7th Conference on Finite Fields and Applications, Springer LNCS 
vol.2948 (2003). 

4. A. Bostan, P. Gaudry, E. Schost: "Linear Recurrences with Polynomial Co- 
efficients and Application to Integer Factorization and Cartier-Manin Operator", 
preprint submitted. 

5. A. Bostan, G. Lecerf, E. Schost: "Tellegen's Principle into Practice", pp.37-44 
in Proc. ISSAC, ACM Press (2003). 

6. R.P. Brent, H.T. Kung: "Fast Algorithms for Manipulating Formal Power Se- 
ries", pp.581-595 in J. ACM vol.25:4 (1978). 

7. P. BuRGissER, M. Clausen, M.A. Shokrollahi: "Algebraic Complexity The- 
ory", Springer (1997). 

8. Cheng, Qi: "On the Ultimate Complexity of Factorials" , pp. 157-166 in Proc. 20th 
Annual Symposium on Theoretical Aspects of Computer Science (STACS'2003), 
Springer LNCS vol.2607. 

9. D.V. Chudnovsky, G.V. Chudnovsky: "Approximations and complex multipli- 
cation according to Ramanujan", pp. 375-472 in Ramanujan revisited, Academic 
Press (1988). 

10. D. Coppersmith, S. Winograd: "Matrix Multiplication via Arithmetic Progres- 
sions", pp. 251-280 in J. Symbolic Computation vol.9 (1990). 

11. R.J. Fateman: "Lookup Tables, Recurrences and Complexity", pp. 68-73 in Proc. 
ACM Symp. Symbolic and Algebraic Computation (ISSAC'89). 

12. CM. FiDUCCiA: "An Efficient Formula for Linear Recurrences", pp. 106-112 in 
SIAM J. Comput. vol. 14:1 (1985). 

13. J. von zur Gathen, J. Gerhard: "Modem Computer Algebra" (2nd Edition), 
Cambridge University Press (2003). 

14. J. Gerhard: "Modular Algorithms in Symbolic Summation and Symbolic Inte- 
gration", Springer LNCS vol.3218 (2004). 

15. R.L. Graham, D.E. Knuth, O. Patashnik: "Concrete Mathematics: A Founda- 
tion for Computer Science" , Addison- Wesley. 

16. X. Huang, V.Y. Pan: "Fast Rectangular Matrix Multiplication and Applicar 
tions", pp. 257-299 in J. Complexity vol.14 (1998). 

17. E.S. Key, H. Volkmer: "Eigenvalue Multiplicities of Products of Companion 
Matrices", pp. 103-1 14 in Electronic Journal of Linear Algebra vol.11 (2004). 



Fast (Multi-)Evaluation of Linearly Recurrent Sequences 



19 



18. W. KOEPF: "Efficient Computation of Chebyshev Polynomials in Computer Al- 
gebra", pp. 79-99 in Computer Algebra Systems: A Practical Guide, John Wiley 
(1999). 

19. W. KoEPF, D. ScHMERSAU: "Recurrence equations and their classical orthogo- 
nal polynomial solutions", pp. 303-327 in Applied Mathematics and Computation 
vol.128 (2002). 

20. W. KOEPF: "The Algebra of Holonomic Equations", pp. 173-194 in Mathematische 
Semesterberichte vol.44 (1997). 

21. P. KoiRAN: "Valiant's Model and the Cost of Computing Integers", pp.131-146 in 

Computational Complexity vol.13 (2004). 

22. A.F. NiKlFOROV, V.B. Uvarov: "Special Functions of Mathematical Physics", 
Birkhauser (1988). 

23. V.Y. Pan: "Structured Matrices and Polynomials" , Birkhauser (2001). 

24. M.S. Paterson, L.J. Stockmeyer: "On the Number of Nonscalar Multiplications 
Necessary to Evaluate Polynomials", pp.60 66 in SIAM J. Comp. vol.2 (1973). 

25. B. Salvy, p. Zimmermann; "GFUN: A Maple Package for the Manipulation of Gen- 
erating and Holonomic Functions in One Variable", pp. 163-166 in ACM Transac- 
tions on Mathematical Software vol. 20:2 (1994). 

26. V. Shoup: "Efficient Computation of Minimal Polynomials in Algebraic Extensions 
of Finite Fields", pp.53 58 in Proc. ISSAC'99, ACM Press (1999). 

27. V. Strassen: "Einige Resultate iiber Berechnungskomplexitat" , pp. 1-8 in Jahres- 
berichte Deutsch. Math.-Verein. vol.78:l (1976/77). 

28. F.G. Tricomi: "Vorlesungen iiber Orthogonalreihen" (2nd Edition), Springer 
(1970). 



